What is bioinformatics?

Bioinformatics


A scientific subdiscipline that involves using computer technology to collect, store, analyze and disseminate biological data and information ~ NIH



DNA Sequence

Gene interactions

Electronic Health Records

“Omics”: large-scale datasets

Multi-omics


Programming or Online tools

Programming languages: Python, R

NCBI BLAST graphical user interface (GUI)

DNA common file formats

“.fasta”

Two-line records: a header line starting with > labels the sequence on the following line.

>First line is "header" information, second line is sequence
GCGAGGCAGCCAGCGAGGGAGAGAGCGAGCGGGCG
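As a rough illustration of the format above, here is a minimal Python sketch that reads FASTA records from a file-like object. The function name `read_fasta` and the header text `example_sequence` are hypothetical, chosen only for this example. The sketch also joins sequences that wrap over several lines, which FASTA files commonly do; established libraries such as Biopython are usually a better choice for real analyses.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object.

    Hypothetical helper for illustration only.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined, so wrapped sequences also work.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Illustrative record using the sequence from the slide.
example = io.StringIO(
    ">example_sequence\n"
    "GCGAGGCAGCCAGCGAGGGAGAGAGCGAGCGGGCG\n"
)
records = list(read_fasta(example))
for name, sequence in records:
    print(name, len(sequence))
```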

“.fastq”

Four-line records: a header line starting with @ labels the sequence that follows.

The fourth line holds a quality score for each DNA base.

@Four line file, head, sequence, +, quality
TCGCACTCAACGCCCTGCATATGACAAGACAGAATC
+
<>;##=><9=AAAAAAAAAA9#:<#<;<<<????#=
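The four-line layout can be sketched in Python as below. This is a minimal example, not a full parser: the names `read_fastq` and `phred_scores` and the read label `example_read` are hypothetical, and the quality characters are assumed to use the Phred+33 encoding (ASCII code minus 33), which is the common convention but is not stated on the slide.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line FASTQ file-like object.

    Hypothetical helper; assumes each record is exactly four lines.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Convert quality characters to integer scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


# Illustrative record using the sequence and quality lines from the slide.
example = io.StringIO(
    "@example_read\n"
    "TCGCACTCAACGCCCTGCATATGACAAGACAGAATC\n"
    "+\n"
    "<>;##=><9=AAAAAAAAAA9#:<#<;<<<????#=\n"
)
fastq_records = list(read_fastq(example))
name, sequence, quality = fastq_records[0]
scores = phred_scores(quality)
print(name, len(sequence), min(scores), max(scores))
```

Under the Phred+33 assumption, a higher score means a lower estimated chance that the base call is wrong, so low-scoring characters such as `#` mark the least reliable bases in the read.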

Example

SERPINB4 gene variant identified in twin patients with Crohn’s disease


Example

Analyses of transcriptomic candidate biomarkers associated with RA


Find more examples: Add to padlet

Search for information or studies on the different omics and add them to the padlet.

Link to padlet

QR Code Link to padlet


Sequencing platforms

How does a DNA sequencing machine work?

Sequencing platform comparison

Illumina: short, paired reads (~300 bp); accuracy above 99.9%

Oxford Nanopore: long reads, maximum over 4 million bases (4 Mb); error rate 1-5%

PacBio: long reads, 10-25 thousand bases (10-25 kb); accuracy above 99.5%
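To give a feel for what these figures mean for a single read, the arithmetic can be sketched in Python. This is a back-of-the-envelope estimate only: it treats each quoted accuracy as a per-base rate and assumes errors are independent and evenly spread, which real platforms do not guarantee. The function name `expected_errors` and the 10,000-base read length used for the long-read rows are assumptions made for illustration.

```python
def expected_errors(read_length, accuracy):
    """Expected number of miscalled bases in one read.

    Simplifying assumption: every base has the same, independent
    probability (1 - accuracy) of being wrong.
    """
    return read_length * (1.0 - accuracy)


# Accuracy figures come from the comparison above; the long-read
# lengths are hypothetical values picked for the example.
examples = [
    ("Illumina, 300 bp read", 300, 0.999),
    ("PacBio, 10,000 bp read", 10_000, 0.995),
    ("Nanopore, 10,000 bp read at 1% error", 10_000, 0.99),
    ("Nanopore, 10,000 bp read at 5% error", 10_000, 0.95),
]

for label, length, accuracy in examples:
    print(f"{label}: about {expected_errors(length, accuracy):.1f} expected errors")
```

Because the slide quotes the Illumina and PacBio accuracies as "above" a threshold, the values for those rows are upper bounds under these assumptions.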

Cost of sequencing

Graph of the decrease in sequencing cost since the Human Genome Project

Cost of human genome sequencing


Genomics and AI

  • Rapid progress in both fields
  • AlphaFold (2018) used AI to revolutionise protein-folding predictions
  • Genomics England podcast

Timeline of Genomics and AI Advancements


GUI Tools

Click each link to preview

RAP: Reproducible Analytical Pipelines

  • Automated statistical and analytical processes
  • Minimising manual steps through automation
  • Using open-source languages
  • Peer review of code
  • Version control
  • Documentation

RAP illustration


Clinical applications: Personalised medicine

Whole Genome Sequencing in paediatric cancer - experience and feedback from two families

NHS Roles

NHS Roles - Clinical bioinformatics

NHS Roles Screenshot


Programming support communities

NHS PyCom Logo



NHS-Python Community

RainbowR logo



RainbowR

Quarto

  • Markdown editor
  • Presentations
  • Reports
  • Books

This slide:

## Quarto 

- Markdown editor
- Presentations 
- Reports
- Books 

See more at Quarto

Summary

  • Genome sequencing creates large datasets
  • Bioinformatics is needed to find patterns in big data
  • Many different types of tools
  • Analyses should be reproducible
  • Real-world clinical applications